The Relationship Between Income and Life Expectancy
Final Project
1 Library Imports
2 Data
2.1 Variables
2.2 Daily Mean Income Household Per Capita
GapMinder compiled data on daily mean household income per capita to analyze income distributions over hundreds of years. These figures are anchored in the official Mean Income indicator from the World Bank, derived from household surveys. For countries lacking World Bank data, GapMinder estimated mean income based on GDP per capita. The available time frame for actual World Bank data spans from 1967 to 2021, though most countries have limited data points within these years.
GapMinder used growth rates of constant-dollar GDP per capita to estimate mean incomes historically from 1800 and to project them up to 2100. For the period 1981-2019, they relied on World Bank data, known for its comprehensive coverage and published in the World Development Indicators as “Survey mean consumption or income per capita, total population (2017 PPP dollars per day).” The World Bank Poverty and Inequality Platform (PIP) describes this indicator as “Survey mean/average consumption or income per capita, total population (2017 PPP dollars per day)” and states that “the mean represents the average monthly household per capita income or consumption expenditure from the survey in 2017 PPP.” Average consumption is not used in our project, but the World Bank collects it in tandem with income per capita.
In short, the average daily income is the mean daily household per capita income or consumption expenditure from the survey, expressed in 2017 constant international dollars.
2.2.1 Life Expectancy
GapMinder collects life expectancy data from various sources to create a comprehensive dataset spanning from 1800 to 2100. Life expectancy at birth refers to the average number of years a newborn is expected to live, assuming that current mortality rates remain constant throughout their lifetime.
For the period from 1800 to 1970, GapMinder relies on its own compiled data (version 7), which includes information from over 100 sources and accounts for historical events causing significant mortality dips. From 1950 to 2019, data is primarily sourced from the Global Burden of Disease Study 2019 by the Institute for Health Metrics and Evaluation (IHME). This source provides detailed annual estimates. For projections from 2020 to 2100, GapMinder uses forecasts from the United Nations’ World Population Prospects 2022. The data is carefully combined, prioritizing IHME data when available, and extending IHME series with UN estimates for future projections.
2.3 Hypothesized Relationship Between the Variables
Higher average daily income is positively associated with higher life expectancy at birth.
Column types as printed for each of the two data frames (the two printouts are identical):

| Column | Type |
|---|---|
| country | character |
| X1800 through X2100 (301 year columns) | double |
2.4 How the Data was Cleaned
To clean the data, we looked at the data types of the values and saw that the year columns were stored as type character even though they hold numeric values. To fix this, we mutated each year’s column to be a numeric type.
The year names initially had an X in front of the name when the data was first loaded. We chose to remove this naming convention after pivoting the data so that we can easily reference the years when graphing our data.
Rather than removing NA values at this stage, we left them in place so that, when joining the data, we can decide which years or countries to keep based on where the two data frames overlap.
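The type conversion described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the `dplyr` package, and the data frame `income_wide` and its values are made up for the example.

```r
library(dplyr)

# Hypothetical stand-in for one of the wide tables (values are made up).
income_wide <- data.frame(
  country = c("A", "B"),
  X1800 = c("1.5", "2.0"),
  X1801 = c("1.6", NA),
  stringsAsFactors = FALSE
)

# Convert every year column to numeric; NA values are kept, not dropped.
income_clean <- income_wide %>%
  mutate(across(-country, as.numeric))

# Inspect the resulting column types.
sapply(income_clean, typeof)
```

Selecting with `-country` converts all year columns at once, so the same call works regardless of how many years the table holds.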
2.5 How the Data was Pivoted
Next, we pivoted the data by country to separate each year into individual observations. For each country and year, we now have the corresponding average daily income and average life expectancy.
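A pivot of this kind might look like the sketch below. It assumes the `tidyr` package; the table `income_wide`, the column name `daily_income`, and the values are hypothetical. The leading X is stripped from the year names after pivoting, as described in the cleaning section.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year.
income_wide <- data.frame(
  country = c("A", "B"),
  X2009 = c(1.5, 2.0),
  X2010 = c(1.6, NA)
)

# One row per country and year; the former column names land in `year`.
income_long <- pivot_longer(
  income_wide,
  cols = -country,
  names_to = "year",
  values_to = "daily_income"
)

# Remove the X prefix and store the year as an integer.
income_long$year <- as.integer(sub("^X", "", income_long$year))

income_long
```

The same steps would be applied to the life expectancy table, with a different `values_to` name.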
2.6 How the Data was Joined
In order to create one data table, we must join the two data sets that were cleaned and pivoted. One way to do this is through an inner join, which also handles missing data by keeping only the country and year combinations that appear in both data sets.
In addition to joining the data, the name of the “country” column was capitalized in order to have uniformity among the variable names.
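As a rough sketch of the join and rename, assuming the `dplyr` package and two hypothetical long tables named `income_long` and `life_long` (the values are made up):

```r
library(dplyr)

# Hypothetical long tables produced by the pivot step.
income_long <- data.frame(
  country = c("A", "A", "B"),
  year = c(2009L, 2010L, 2010L),
  daily_income = c(1.5, 1.6, 2.0)
)
life_long <- data.frame(
  country = c("A", "B", "C"),
  year = c(2010L, 2010L, 2010L),
  life_expectancy = c(60, 70, 80)
)

# Keep only country-year pairs present in both tables, then capitalize
# the country column name.
joined <- inner_join(income_long, life_long, by = c("country", "year")) %>%
  rename(Country = country)

joined
```

In this toy example, country A in 2009 and country C in 2010 are dropped because each appears in only one table. NA values inside matching rows are not removed by the join itself.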
3 Linear Regression
3.1 Exploring the Relationship Between the Two Variables
The variables to be explored are the average daily income and the average life expectancy. The relationship of interest is how income affects life expectancy.
The explanatory variable is the average income and the response variable is the average life expectancy.
To explore the relationship over time
3.2 Linear Regression
3.2.1 Steps to Choosing Regression Features
We simplified the linear regression by restricting it to a single year, 2010. Daily income and life expectancy have changed substantially over the centuries, which makes it difficult to capture the full extent of these trends in a single regression model.
Historical data from the 1800s to the present day illustrates substantial shifts in both daily income and life expectancy, reflecting changes in economic, social, and healthcare systems globally.
By selecting the year 2010 as a reference point, we aim to focus on a period that represents a modern snapshot of these trends. Here’s why 2010 is a good choice:
1. Representative Modern Era: 2010 serves as a representative point in the modern era, offering insights into contemporary socioeconomic and health conditions across countries.
2. Mitigation of Predicted Data: The decision to exclude years beyond 2010 accounts for the absence of actual data and instead focuses on observed trends. This approach prevents potential biases introduced by predicted data, particularly in later years beyond the data collection timeframe.
3. Adequate Time for Analysis: With 14 years having passed since 2010, this timeframe provides sufficient data for analysis while minimizing the impact of short-term fluctuations that may occur within smaller time intervals.
By anchoring our analysis to the year 2010, we aim to capture meaningful trends in daily income and life expectancy while ensuring the reliability and relevance of our linear regression model.
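Restricting the joined table to a single year could be done along these lines. The sketch uses base R only; the table `joined` and its values are hypothetical.

```r
# Hypothetical long table in the shape produced by the join (values made up).
joined <- data.frame(
  Country = c("A", "A", "B", "B"),
  year = c(2009L, 2010L, 2010L, 2011L),
  daily_income = c(1.5, 1.6, 20.0, 21.0),
  life_expectancy = c(59, 60, 78, 79)
)

# Keep one observation per country: the 2010 snapshot.
data_2010 <- subset(joined, year == 2010L)

data_2010
```

Working with one row per country keeps the observations independent of each other in time, which a pooled regression over all years would not.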
3.2.2 Regression Code
Call:
lm(formula = life_expectancy_2010^5 ~ daily_income_2010, data = average_data_years)
Coefficients:
(Intercept) daily_income_2010
1.282e+09 3.487e+07
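The fitted regression equation is \(\widehat{y^5} = 1.282\times10^9 + 3.487\times10^7x\), where \(x\) is the daily income in 2010 and \(y\) is the life expectancy in 2010. Because the response is the fifth power of life expectancy, predictions are converted back to years by taking the fifth root.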
3.2.3 Interpretation of coefficients:
Intercept (\(1.282\times10^9\)): The intercept is the estimated fifth power of life expectancy in 2010 when daily income is zero. Taking the fifth root gives a predicted life expectancy of about 66.3 years. This value is an extrapolation, since no country has a daily income of zero, so it is best read as the model's baseline rather than as a prediction for any observed country.
Daily Income Coefficient (\(3.487\times10^7\)): The coefficient is the estimated change in the fifth power of life expectancy for a one-unit increase in daily income. Because the response is transformed, the fifth root of the coefficient is not a change in years; the change in years depends on the starting income. For example, moving from a daily income of 0 to 1 raises the predicted life expectancy from \((1.282\times10^9)^{1/5} \approx 66.3\) years to \((1.282\times10^9 + 3.487\times10^7)^{1/5} \approx 66.7\) years, an increase of about 0.36 years, and the increase shrinks at higher incomes.
These interpretations provide insights into the relationship between daily income and life expectancy in the year 2010, as captured by the estimated regression model.
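To read the coefficients in years, predictions have to be converted back from the fifth-power scale. The sketch below does this arithmetic with the coefficients printed in the regression output; the helper `predict_years` is a hypothetical name introduced only for illustration.

```r
# Coefficients as printed in the regression output.
b0 <- 1.282e9  # intercept, on the life_expectancy^5 scale
b1 <- 3.487e7  # change in life_expectancy^5 per unit of daily income

# Back-transform a prediction to years by taking the fifth root.
predict_years <- function(income) (b0 + b1 * income)^(1 / 5)

baseline <- predict_years(0)                          # predicted years at zero income
gain_low <- predict_years(1) - predict_years(0)       # gain from the first unit of income
gain_high <- predict_years(51) - predict_years(50)    # gain from one more unit at income 50

c(baseline = baseline, gain_low = gain_low, gain_high = gain_high)
```

The gain in years from one extra unit of income is not constant: it is larger at low incomes than at high incomes, which is a direct consequence of modelling the fifth power of life expectancy rather than life expectancy itself.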
3.3 Model Fit
| Total Variance | Fitted Variance | Residual Variance |
|---|---|---|
| 75.7859 | 5.409526e+17 | 4.566445e+17 |
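The table mixes two scales: the total variance of 75.79 is computed on life expectancy in years, while the fitted and residual variances are computed on the fifth-power scale used by the model. On the model's scale, the total variance is the sum of the fitted and residual variances, \(5.410\times10^{17} + 4.566\times10^{17} = 9.976\times10^{17}\), which gives \(R^2 = 5.410/9.976 \approx 54.2\%\). The remaining 45.8% of the variance is unexplained.
Based on the \(R^2\) of about 54%, the model quality is moderate. Daily income explains a little over half of the variability in the fifth power of life expectancy, leaving a substantial share unexplained.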
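The variance decomposition behind \(R^2\) only holds when all three variances are measured on the same scale as the model's response. The sketch below checks this on simulated data using base R; the data, the seed, and the variable names are assumptions made for the example and are not the project's data.

```r
set.seed(1)

# Simulated stand-ins for daily income (x) and life expectancy (y).
x <- runif(100, min = 1, max = 50)
y <- 60 + 0.3 * x + rnorm(100, sd = 4)

# Same model form as in the report: fifth power of the response.
fit <- lm(I(y^5) ~ x)

# All three variances are taken on the y^5 scale.
total_var <- var(y^5)
fitted_var <- var(fitted(fit))
resid_var <- var(residuals(fit))

# R^2 is the share of total variance captured by the fitted values.
r2 <- fitted_var / total_var

all.equal(total_var, fitted_var + resid_var)  # decomposition holds
all.equal(r2, summary(fit)$r.squared)         # matches lm's own R^2
```

If `total_var` were instead computed as `var(y)`, the three quantities would no longer add up, and the ratio would not be a valid \(R^2\).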